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Abstract 

In this paper we introduce a new multiparametric technique that attempts to tackle 
simultaneously the problems of composition determination and hadronic interaction 
uncertainty. Employing simulations of a real world detector under its planned op- 
erational conditions, and disregarding systematics, we can asses that the present 
technique should be able to determine the composition of a binary mixture of p and 
Fe with a statistical confidence of few percent, in a way that is independent of the 
assumed hadronic interaction model. Moreover, the combination of real data with 
the tools developed and presented here should give an indication of the reliability 
of the various hadronic interaction models in current use in the area. We center our 
study in the region of the ankle, where composition carries critical astrophysical 
information, and use two main parameters: the number of muons at 600 m from 
the shower axis and the depth of the shower maximum obtained from the hybrid 
operation of the planned muon counters and high elevation fluorescence telescopes 
of the AMIGA and HEAT Auger enhancements. 
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1 Introduction 



The cosmic ray energy spectrum, in the high energy region above 10 17 eV, 
presents two main features observed by several experiments, the second knee, 
observed at 4 x 10 17 eV [If2f5P] and the ankle. There is evidence of a fourth 
feature situated at the highest observed energies, consistent with the so-called 
GZK suppression [5|6] , which would be caused by the interaction of the ultra- 
high energy protons with the photons of the cosmic microwave background 
radiation (CMBR) [7115] . For the case of heavier nuclei a similar effect is ex- 
pected because of their interaction with photons from the infrared and mi- 
crowave backgrounds [9]. 

The origin of the second knee is still unclear. It has been interpreted as the 
end of the efficiency of the acceleration in Galactic supernova remnant shock 
waves [10], a change in the diffusion regime in our galaxy [TTfT2"] or even the 
transition between the Galactic and extragalactic components of the cosmic 
rays [13]. 

The ankle is a broader feature. It has been observed by Fly's Eye |2J, Haverah 
Park [Uj, Yakutsk [3], HiRes [I] and Auger [5] in Hybrid mode at approxi- 
mately the same energy, ~ 3 x 10 18 eV. The origin of the ankle is also unknown. 
It can be interpreted as the transition between the Galactic and extragalactic 
components [15J or the result of pair production by extragalactic protons after 
the interaction with photons of the CMBR during propagation [ToTfTT] . 

There are three main models of the Galactic-extragalactic transition. The first 
is the mixed composition model [15], in which extragalactic sources inject a 
spectrum of masses similar to lower energy Galactic cosmic rays and for which 
the transition takes place at the ankle. The second is the ankle model [18], a 
two-component transition from Galactic iron nuclei to extragalactic protons 
at the ankle energy. The third is the dip model [13], in which the ankle is due 
to pair production of extragalactic protons that interact with the photons of 
the CMBR (in this scenario the transition occurs at the second knee). 

In order to rule out or substantiate any of those models, additional infor- 
mation is necessary besides the energy spectrum shape and absolute inten- 
sity. Detailed measurements of the composition as a function of energy, while 
not sufficient, would be extremely valuable to break the present degeneracy 
among competing models for the Galactic-extragalactic transition [T9|20] . Fur- 
thermore, this kind of information could help to determine what the highest 
energy accelerators in the Galaxy are and provide indicators of the kind and 
level of magnetohydrodynamic turbulence present in the intergalactic medium 
traversed by the lowest energy cosmic ray particles [TO] . 

Several experiments have measured the cosmic ray composition in the region 
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where the transition takes place. Nevertheless, large discrepancies exist be- 
tween different experiments and experimental techniques [21] . One of the main 
reasons behind the plurality of sometimes contradicting results is that compo- 
sition is determined by comparing experimental data with numerical shower 
simulations. These simulations include models for the relevant hadronic inter- 
actions which are extrapolations, over several orders of magnitude in center of 
mass, of accelerator data to cosmic ray energies. This is a source of consider- 
able uncertainty which is confirmed, to a certain extent, by the fact that there 
is experimental evidence of a deficit in muon content of simulated showers 
with respect to real data [2"2"] . 

In this paper we introduce a new statistical method to test the compatibil- 
ity of the high energy hadronic interaction models and real data. In fact, the 
hadronic interaction model is assumed to be the main systematic uncertainty 
in the present analysis. Throughout our analysis, we consider QGSJET-II 
[23"f2"l] and Sibyll 2.1 [25]. Our method allows us to verify whether the ex- 
perimental data are compatible with the hadronic models under consideration 
and, if so, to estimate the composition. 

Although this new technique is of general applicability, we study its potential 
in the context of AMIGA (Auger Muons and Infill for the Ground Array) [26] 
and HEAT (High Elevation Auger Telescopes) [27], the lower energy exten- 
sions of the Southern Pierre Auger Observatory. These two enhancements will 
extend the energy range down to 10 17 eV, encompassing the second knee and 
ankle region where the Galactic-extragalactic transition takes place. 

The most relevant mass sensitive parameters that will be obtained from AMIGA 
and HEAT are the number of muons at 600 m from the shower core, iV M (600), 
and the atmospheric depth of maximum shower development, A max , respec- 
tively. Consequently, we develop our statistical technique by using mainly 
this pair of parameters. Combinations of ^(600) with parameters from the 
Cherenkov detectors like the slope of the lateral distribution function, rise-time 
of the signals and curvature radius are also studied. 

Note that our technique should also be applicable to Telescope Array and its 
low energy extension [2"51T2l?] , which have hybrid capabilities in the region of the 
ankle and also plan to include muon detectors [29]l30] . The quoted expected 
error for X max is ~ 20 g cm -2 [30], which is comparable to the value estimated 
for HEAT. 

^(600) is nearly linearly dependent on energy; therefore, its use as a compo- 
sition estimator requires, ideally, an independent determination of the shower 
energy. This is not a problem in the case of the Auger enhancements, where 
the same strategy of energy calibration as with the baseline design can be 
used. Hybrid events from the AMIGA-HEAT detector provide a calibration 
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for S(ro), the lateral distribution function value at a fixed distance tq ~ 600 
m, as measured by the AMIGA surface array of water Cherenkov detectors. 
The same procedure should be, in principle, applicable to other detectors with 
hybrid capability like Telescope Array. This is so because 7^(600) and energy 
are correlated through Sgoo an d the latter also receives an important contri- 
bution from the electromagnetic lateral distribution function, while iV M (600) 
is directly used as a composition parameter. Therefore, at energies of few 10 18 
eV or smaller, under the assumption of a binary mixture of proton and iron, 
the separation in A^(600) due to composition alone at fixed energy is larger 
than the separation in S 6 oo due to composition at the same energy. This can 
be clearly seen, for example, in figures 12. a and 12. b of Ref. [31], where the 
merit factors (or discrimination power) for different composition indicators 
are shown as a function of energy for two zenith angles. The merit factor is, 
basically, the separation between the distribution functions of each parameter 
at a given energy normalized by the combined dispersion of both distributions. 
It can be seen that, at a fixed energy, the separation between the distribu- 
tions of iV M (600) for proton and iron, the main composition indicator, is much 
larger than the separation between the corresponding distributions of 5*600; 
the energy estimator. Consequently, the fact that A^(600) is correlated simul- 
taneously to energy and composition does not inhibit its use as a composition 
parameter. 

Section 13.31 shows our main results. In particular, it is shown there that, by 
working on a plane defined by two parameters derived from A max and iV M 
data, an estimation of the cosmic ray composition can be obtained that is 
reasonably independent of the uncertainties related to the underlying hadronic 
interaction model. Furthermore, we demonstrate that rather small samples of 
events at a given reconstructed energy, compatible with the level of statistic 
expected from detectors currently under construction that will operate in the 
ankle region, are enough to this end. Additionally to the determination of 
composition, given two possible interaction models and an observed data set, 
the technique can be used to assess the compatibility of these models with the 
experimental data. 



2 Abundance Estimator 

In order to develop a statistical method to infer the composition of the cosmic 
rays (i.e. the abundance of a given primary type), let us consider two possible 
types of primaries, A = a,b, and samples of size N = N a + ]V&, where N a and 
Nj, are the number of events corresponding to type a and b, respectively. From 
each event of an individual sample it is possible to extract several observable 
parameters sensitive to the primary mass. Therefore, for a given mass sensitive 
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parameter q we define, 

1 N 1 

= ]y X! ^a(<Zi) = ^7 



i=l 



N 



N a N b 

i=l i=l 



where g 4 A are AOi random variables distributed as /a(<?) and 



which is the probability that an event is of type a for a given q, assuming no 
prior knowledge of the primary type. Note that we restrict our analysis to the 
case in which the cosmic rays are the superposition of two components. 

The new statistic, so defined, is an estimator of the abundance of the primary 
of type a. £ q is a random variable because it is a function of N random vari- 
ables, £ g = £, q (q1 ■ ■ ■ q% a , q\ ■ ■ ■ q b N )• Therefore, its mean value as a function of 
the composition is given by, 



(Q(c a ) = Jdq a 1 ... dq%dq\ . . . dq b N>> CM . . . q\ . . . q b Nb ) X 

fa{q^)---fa{<t Na ) (3) 

where c a = N a /(N a + Nf,) is the composition (or the abundance) corresponding 
to the primary a for the sample of N events. Noting that /a (9) are probability 
density functions and, therefore, that / dq f a (q) = 1, the dependence of (£ q ) 
on c a can be explicitly seen after integrating Eq. [3l 

(Q(c a ) =m c a + d, (4) 
where, 



m = J dqP a (q)(f a (q)-f b (q)), (5) 
d = J dqP a (q)f b (q). (6) 



Equation shows that the mean value of £ g increases linearly with the com- 
position of the samples and from equations (!5ll6|) we see that as the overlap 
between the distributions f a (q) and fb(q) decreases, (£ g )(c ) tends to the iden- 
tity function. We also see that (£ 9 )(l/2) = 1/2, independent of the shape of 
the distributions f a (q) and fb(q)- 
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The variance of £ is given by, 

Var[Q{c a ) = 1 [c a (a 2 a [P a (q)} - a 2 b [P a (q)}) + ^ b [P a (q)]\ , (7) 

where, 

°A[Pa(q)] = J dqP'(q)f A (q) ~ (/ dqP a {q) f A (q)^ . (8) 

Therefore, the variance of £ also increases linearly with c a . Note that it is pro- 
portional to N^ 1 , i.e., it approaches zero as the size of the sample approaches 
infinity. 

The composition estimator £ 9 is defined for each parameter q, and therefore 
a covariance can be calculated for any two of such estimators, say g M and q v . 
Let /a(?/j, qv) be the distribution function of these variables and 



/m(?m) = / d( b> U(ln, Qv), (9) 

fuA{Qv) = J dq^ f A (q^ q v ), (10) 

p ( a ) = UM)_ fill 

P,a(q)= f ( f :i q ] , y (12) 

fua(q) + fub{q) 
The covariance between and £ 0j/ is given by, 



Cov(^,£ qi/ ) = ^[c a J dq^dq u P^a{q^)P V a{qu) (faiq^Qu) - 

ffta{qii)fva{qv)) + (1 - C a ) / dq^dq u P^aiqn) Pva{qv) X 

(fb(q»,qu) ~ f^b{q^)fub{qu)) ■ 



(13) 



From Eq. (JIB"]) we can see that Cov(£ q , £ J is also a linear function of c a , and a 
sufficient condition to be zero is that the variables q^ and q^ are independent. 

The parameter £ is the sum of N random variables, therefore, for large enough 
values of N, this variable follows a Gaussian distribution because of the central 
limit theorem. 

As a simple example of Eq. (J4]), let us consider two Gaussian distribution 
functions of mean values +x and — x and a — 1. Under this assumption, the 
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Fig. 1. Slope and intercept of (£)(c a ) as a function of n for two Gaussian distributions 
with £7 = 1. 

parameter 77 = \(qi) — (q 2 ) \ /[a 2 {qi] + c 2 ^)} 1 ^ 2 , which measures the discrim- 
ination power of q, can be easily calculated and takes the values 77 = x/y/2. 
Fig. [Q shows the corresponding slope and the intercept of (£)(c a ) as a function 
of T]. We see that, as 77 increases, the intercept approaches zero and the slope 
approaches one, i.e., as expected, (£)(c a ) tends to the identity function. 



3 Simulations and Composition Analysis 



3. 1 Shower and detector simulations 



In order to perform composition analyses around E — 1 EeV (EeV = 10 18 
eV), including the effect of the energy uncertainty, we generated a library of 
atmospheric showers by using the AIRES package version 2.8.2 [32]. We used 
a relative thinning of 0.6 and statistical weight factor 0.2 (see Ref. [32] for 
details). For simplicity, and to a good approximation, we assume that the 
showers follow a power law energy spectrum with spectral inde:?! 2 ! 7 = —2.7 in 

2 The impact of the exact value of 7 in our results is negligible. 
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the energy interval [0.6, 2] EeV. We generated eight sets of 166 showers each 
corresponding to two kinds of primaries, protons and iron nuclei, two values 
of zenith angle, 9 = 30° and 9 = 45°, and two different models of high energy 
hadronic interactions, QGSJET-II and Sibyll 2.1. 

Each of these showers was used to generate events of the AMIGA-HEAT 
detector system. The simulation of the AMIGA detectors was performed with 
a dedicated package described in Ref . [3T] • Each shower was used 40 times by 
distributing impact points uniformly in the 750 m-array. We assumed 30 m 2 
muon detectors segmented in 192 cells and buried 2.5 m underground |31j . 
We used a time binning of 20 ns and an efficiency of each segment equal to 
one. The shower arrival directions and core positions were reconstructed with 
the standard CDAS package Er-v3r4 [33] specifically developed to reconstruct 
the Cherenkov detector information in Auger. To reconstruct the muon lateral 
distribution function, we used the method introduced in Ref. [3~T] . 

The effects of the response of the HEAT telescopes and the reconstruction 
procedure for the longitudinal profile in the determination of X max were in- 
cluded using the approach of Ref. [31]. For each simulated AMIGA event, 
corresponding to a given shower, we obtained a reconstructed X max by taking 
a random value from a Gaussian distribution of mean value equal to the one 
calculated internally in AIRES (obtained by fitting the longitudinal profile 
with a Gaisser-Hillas function) and a, for the corresponding energy, obtained 
from the interpolation of the simulated data given in Ref. |27j. Note that, 
despite the fact that the fluctuations in X max have been included in a realistic 
way, the same cannot be said of the possible correlations between the fluctu- 
ations in fluorescence and surface parameters for hybrid events. Nevertheless, 
we expect that such correlations could be dealt with by proper quality cuts 
without affecting our main conclusions. 

In short, for each simulated event, we obtained all the parameters with the 
corresponding fluctuations of the Cherenkov, muon and fluorescence detectors. 
We obtained 8 sets of N = 166 x 40 = 6640 events each (depending on 
the reconstruction efficiency). To refer to each set, we will use the notation 
S(9,A,h) where 9 = 30°, 45°, A =Proton, Iron and h =QGSJET-II, Sibyll 
2.1. 



3.2 Probability density function estimation 

In order to calculate £ (see Eq. ([1])), we need the distribution functions, /aIq), 
of the different parameters sensitive to the primary mass considered, includ- 
ing the effects of the detectors and reconstruction methods. We use the non- 
parametric method of kernel superposition [3"4|35f36ll3"T] as an estimate of these 
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probability density functions from the simulated data. 



We assume a Gaussian error of 25% in order to include the uncertainty in 
the determination of the primary energy, which has an important effect in 
the discrimination power of A^(600). Therefore, we consider samples of events 
obtained in the following way: for each simulated event, belonging to a given 
set S(9, A, h), of real energy E, we estimate the reconstructed energy by taking 
a random value from a Gaussian distribution of mean E and a = 0.25 x E. 
Since the bias introduced by the rapid fall of the spectrum shifts the center 
of the bin from 1 EeV to 1.14 EeV (see Appendix B), if the "reconstructed" 
energy falls in an energy interval of width 0.5 EeV centered at 1.14 EeV, 
the event is added to a new sample, Su r (0,A,h). We repeat this procedure 
10 times for each set S(9, A, h). Therefore, we obtain 10 samples Sjj r (9, A, h) 
with i = 1 ... 10. 

The parameters used in the analysis are defined as follows: (i) iV M (600) is the 
number of muons at 600 m from the shower core, which is estimated from a fit 
to the lateral distribution of the number of muons measured by the AMIGA 
muon counters [31], (ii) R is the curvature radius of the shower front, (iii) 
(3 is the slope of the lateral distribution function of the signal in the water 
Cherenkov detectors and (iv) t\/2 is a parameter constructed with the rise-time 
of the signal in a selected subset of the triggered water Cherenkov detectors, 



where Nt is the number of stations with signal greater than 10 VEM (Vertical 
Equivalent MuorJZI), t\ and t 50 are the times at which 10% and 50% of the 
total signal is collected, respectively, and is the distance from the z-th station 
to the shower axis. Only stations at a distance to the shower axis greater than 
400 m are included in Eq. (1141) . 

For each obtained sample (9, A, h) we calculate the probability density esti- 
mate corresponding to four pairs of parameters: (Ar M (600), A max ), (iV M (600), /?), 



Better estimates of the density functions are obtained using the adaptive band- 
width method introduced by B. Silverman [31]. We perform a first estimation 
of each density function by using a Gaussian kernel with a fixed smoothing 



3 The signal deposited in a water Cherenkov tank when fully traversed by a muon 
vertically impinging in the center of the tank [38] . 




(14) 



(A^(600),t 1/2 ) and (^(600), fl). 



9 



parameter, 



N 



VI 2vr hi i=i 



N 

J2 ex p 



X 



XiW-Hx 



Xi 



2hl 



(15) 



where x is a two-dimensional vector of one of the pairs of the parameters 
considered, N is the size of the sample, V is the covariance matrix of the data 
sample and h = 1.06 x TV -1 / 6 is the smoothing parameter corresponding to 
Gaussian samples. The latter is used very often in the literature [39J because 
it gives very good estimates even for non Gaussian samples. From the density 
estimate obtained by using Eq. (fl5l) we calculate the parameters, 

-1/2 

(16) 



fo(xi 



(nf =1 / (%: 



l/N 



and then we obtain the density estimate from, 



i 



Nj\V\ 2tt 



N 

E 

i=l 



h! exp 



(x — X:) T V l (x 



2h 



(17) 



where hi = h Aj. 

Fig. [2] shows the average over the 110 density estimates corresponding to the 
parameters X max and A^(600) for protons, 9 = 45° and QGSJET-II as the 
hadronic interaction model. 

Every estimate f(x) has an associated uncertainty because it is constructed 
from a finite set of iV elements. To take into account this uncertainty in the 
composition determination we use the smoothed bootstrap technique [34] . For 
each f(x) we obtain 10 different samples, of the same size used to obtain f(x), 
by selecting random two-dimensional vectors from it. To every bootstrap sam- 
ple we perform the same procedure done to the the original sample used to ob- 
tain f(x). Therefore, for every primary type, zenith angle, hadronic model and 
pair of parameters we obtain 110 estimates of the corresponding distribution 
function. Note that the smoothed bootstrap technique just allows us to esti- 
mate the variance of each f(x) but not the bias (Bias[f(x)] = E[f(x)] — f(x)), 
which we assume is of negligible importance because we are using adaptive 
smoothed parameters. 

Fig. [3] shows the mean value and the one sigma region for the marginal distri- 
butions (see Eq. [9]) of the parameters X max and A^(600) for protons and iron 
nuclei, 9 = 45° and QGSJET-II and Sibyll 2.1 as the high energy hadronic 
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Fig. 2. Average distribution function for the parameters X max and A^(600) cor- 
responding to protons of 9 = 45° and QGSJET-II as the high energy interaction 
model. 
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Fig. 3. Mean value and the one sigma region corresponding to the marginal distri- 
bution estimates of X max and A^(600) for 6 = 45°, protons and iron nuclei and 
QGSJET-II and Sibyll 2.1 as the high energy hadronic interaction models. 



interaction models. From this figure we see that, on average, QGSJET-II pro- 
duce more muons than Sibyll 2.1 and that the difference between the (X max ) 
of protons and iron nuclei is smaller for QGSJET-II. 
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3.3 Composition determination 



The method presented here is statistical. In principle, we want to infer from a 
data sample the average composition of the cosmic rays assuming a mixture of 
proton and iron nuclei, i.e., < c p < 1. We consider samples of iV = 100 and 
iV = 1000 events which are the number of hybrid and surface detector events 
in the energy interval considered, respectively, expected for the 750 m-array 
in two years of data taking [40J. iV = 1000 is also the sample size of the hybrid 
events that fall in the energy interval under consideration for the life time of 
Auger, ~ 20 years. 

For each pair of mass sensitive parameters, zenith angle, hadronic interaction 
model and for each value of c p from to 1 in steps of Ac p = 0.1, we sample the 
average proton and iron distributions to generate 1000 independent samples 
of A^ = 100 events and 300 independent samples of A^ = 1000 events. 

For each of these samples, corresponding to a given pair of parameters (qi, 52), 
we calculate £ gi and £, q2 from, 



£*'( c ) = — 



Nc p N(l-cp) 
i=l i=l 



where v — 1,2 and 

P P M M = fu ; |# , , • (10) 



Here fp^{qu) is the kth marginal distribution function for protons correspond- 
ing to the parameter q u calculated from the kth estimate /p (<7i, (fe)- ffeviVv) 
is the Zth marginal distribution but for iron nuclei. We take QGSJET-II as 
the "true" high energy hadronic interaction model, therefore, we use just the 
QGSJET-II estimates to calculate Pp l u (q u ). 

For each sample corresponding to a given pair of mass sensitive parameters, 
hadronic interaction model, zenith angle and proton abundance we obtain 
110 x 110 values of £ qv which correspond to all possible combinations of the 
marginal distribution functions of proton and iron nuclei. Finally, for each pair 
of parameters, hadronic interaction model, zenith angle and c p we calculate 
the mean values (6j 2 ), the standard deviations cr(£ 9l ), c{^ q , 2 ) and the 

covariance ccw(£ ?1 , £ ?2 ). 

Fig.|4]shows the parameters ^ and £x max as a function of the proton abundance 
of the samples for 9 = 45°, A^ = 100 and A^ = 1000 events and samples 
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Fig. 4. Mean value and one and two sigma regions for the parameters and £x max 
as a function of the proton abundance of the samples for 9 = 45°, N = 100 and 
N = 1000 and for samples corresponding to QGSJET-II and Sibyll 2.1. 

generated with QGSJET-II and Sibyll 2.1. It also shows the one and two 
sigma regions of £ g . As expected, a is much smaller for N = 1000, because, as 
explained in section [2, the variance of £ is inversely proportional to the sample 
size. 

From Fig. H] we see that the mean value of £ increases linearly with the proton 
abundance of the samples, but with slope smaller than one, which corresponds 
to the ideal case where the iron and proton distributions do not overlap. The 
slopes of and (£x max ) f° r t ne QGSJET-II samples are comparable, mean- 
ing that the discrimination power of A^(600) and X max is similar. This is 
consistent with the results of Ref . [31] , where we showed that the discrimina- 
tion power of A^(600) is considerably larger than for X max . But when we take 
the energy uncertainty into account, they become comparable since showers' 
muon content depends almost linearly on primary energy. 

For samples generated with Sibyll 2.1 (as mentioned, we take QGSJET-II 
as the reference model) we see that the behavior of £ g is quite different, in 
particular for c p = 1/2 the value of £ g is no longer 1/2. Moreover, the mean 
value of £ M , for a given value of c p , is larger for Sibyll 2.1 since, on average, 
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QGSJET-II produces more muons than Sibyll 2.1 (see Fig. [3]). Focusing on 
X max , we see that for smaller values of c p (larger values of iron abundances), the 
mean value of £x max is smaller for Sibyll 2.1. This is due to the fact that X max 
for iron nuclei is, on average, smaller for Sibyll 2.1 (see Fig. [3]). For larger values 
of c p , (£x max ) f° r Sibyll 2.1 is of order or even larger than the corresponding 
parameter in QGSJET-II. Again, this happens because the mean value of X max 
for Sibyll 2.1 corresponding to protons is slightly larger than for QGSJET-II. 
As a final result, the slope of the straight line resulting from Sibyll 2.1 is larger 
than the slope resulting from QGSJET-II. 

We found similar results for 9 = 30°. In particular, in this case, the slope of 
£x max is greater than that corresponding to which means that the discrimi- 
nating power of £x max is greater than that corresponding to A^(600). Although 
the behavior of £ M and £v max f° r $ = 30° and 9 = 45° are qualitatively similar, 
the dependence on the zenith angle is not negligible. 

As mentioned in section (2J the distribution functions of the variables £ g i and 
^ q 2 are Gaussian. The mean value and the covariance matrix depend on the 
proton abundance of the samples and therefore so will the ellipses that enclose 
regions of a given value of probability. Fig. [5] shows the ellipses corresponding 
to 68% and 95% probability for the parameters £ M and £v max f° r our cas e 
study: 9 = 30° and 45°, iV = 100 events, for samples built using QGSJET-II 
and Sibyll 2.1. The evolution of the abundance on the ^ — £x max plane, and the 
shape and size of the associated ellipses, allow for a smooth estimation of the 
composition in a way that is reasonably independent of the assumed hadronic 
interaction model. Furthermore, given two possible interaction models and an 
observed data set, a figure like the ones depicted in figure [5] can be used to 
assess simultaneously the compatibility of these models and the experimental 
data. 

Having an experimental sample, we can obtain a point on the £ M — £x max plane 
by using the density estimates obtained from simulations. This point, together 
with a diagram like the one in Fig. [5] allow for a quick evaluation of the com- 
patibility between the experimental data and the hadronic interaction models 
under consideration. Moreover, if the position of the experimental data point 
on the diagram is compatible with any of the hadronic interaction models, one 
can also obtain a rough estimation of the composition by simple inspection of 
the nearest ellipses. As an example, let us consider the top panel of Fig. [5] and 
an experimental point of coordinates (0.4,0.24). For this particular example, 
one can immediately tell form the position of the point with respect to the 
curves that the data is compatible with Sibyll 2.1 and that the composition is, 
approximately, in the interval [0, 0.04] at 68% confidence level. If, on the other 
hand, one considers a point like (0.56, 0.5) it is not possible to discriminate 
between QGSJET-II and Sibyll 2.1; however one can still estimate the compo- 
sition is in the interval [0.5, 0.6] at 68% confidence level. A point like (0.3, 0.6), 



14 



0.9 


- 






0.8 


; 

Z~ 






0.7 


: 
: 


QGSJET-II ^^Z^ 




0.6 


1 
z 






s 


z 






c 

x0.5 
















0.4 
















0.3 




; ^zzt^fe^K; 




0.2 






9 = 45° 












N = 100 


0.1 


f 1 1 1 1 1 




1 , 



0.2 



0.3 



0.4 



0.5 0.6 



0.7 



0.8 



0.9 




Fig. 5. Ellipses corresponding to 68% and 95% probability for the Gaussian distri- 
butions of the parameters £ M and £x max f° r c p S [0,1], 6 = 30° and 45°, N = 100 
events and for samples corresponding to QGSJET-II and Sibyll 2.1. 



that is located too far from the curves is inconclusive from the point of view of 
composition or hadronic interaction model, but is a strong indicative of large 
systematic errors in the detector. 

Fig. [6] shows the ellipses corresponding to 68% and 95% probability for the 
parameters £ M and £x max : ^ = 30° and 45°, iV = 1000 events, for samples 
built using QGSJET-II and Sibyll 2.1. As expected, the size of the ellipses is 
smaller than the corresponding to the case of N = 100 events which allow a 
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Fig. 6. Ellipses corresponding to 68% and 95% probability for the Gaussian distri- 
butions of the parameters £ M and £x max f° r c p G [0, 1], = 30° and 45°, N = 1000 
events and for samples corresponding to QGSJET-II and Sibyll 2.1. The points Pj 
and with i = 1, 2, 3 used as examples to illustrate the method to infer the proton 
abundance are also shown. 



more stringent test of the compatibility of experimental data with the hadronic 
interaction models. 

If the experimental point falls close to the region of the ellipses, i.e., the 
hadronic interaction models are compatible with the data, the composition 
can be estimated in a more formal way. For this purpose we perform a linear 
interpolation of the mean values and the elements of the covariance matrices 
(see Eqs. (|4|13p ). as a function of c p for each zenith angle, sample size and 
high energy hadronic interaction model considered. We then assumed a linear 
dependence in the £x max plane that links the same value of composition, c p , 
of the two hadronic models considered ("composition isolines") and choose A 
as a variable in the [0, 1] interval which takes the values and 1 for QGSJET- 
II and for Sibyll 2.1, respectively. This approach is based on the assumption 
that a continuous and smooth parametric variation between different hadronic 
models is possible on the £ M — £x max plane. 

In this way we obtain, for each zenith angle and sample size, the functions 
jl(cp, A) = ((£fj,)(c p , A), (£x max )(c p , A)) and V(c p , A). The intermediate values of 
A correspond to hadronic models for which the values of the parameters £ M 
and £x max fall in between those corresponding to QGSJET-II and Sibyll 2.1. 
Therefore, A parametrizes the composition isolines for hadronic models that 
would yield points between those corresponding to QGSJET-II and Sibyll 2.1. 

Let us suppose that for a given experimental sample with zenith angle 9 and 
number of events N we obtain £ exp = (^ p , (j^). We can estimate the proton 

abundance, c p , by solving the equation /2(c p , A) = £ ei ' p . The regions in the 
(c p , A) plane compatible with £ exp at a given confidence level are the solutions 
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Fig. 7. Regions in the c p — X space compatible with the points P, and Qi with 
i = 1, 2, 3 at 95% confidence level for samples of iV = 100 and N = 1000 events. 



of the inequality, 

(r p - Re,, a 



V-^AJ-^-^A)) <i? 2 («) 



(20) 



where R 2 (a) = — 21n(l — a) and a is the confidence level (for instance, a = 
0.68, a = 0.95, etc.). 

To illustrate the method we calculate the proton abundance of samples of 
iV = 100 and iV = 1000 events, corresponding to the points in the £ M — £x max 
space: P x = (0.36,0.2), P 2 = (0.52,0.52) and P 3 = (0.72,0.72) for 9 = 30° and 
Qt = (0.4,0.39), Q 2 = (0.56,0.46) and Q 3 = (0.7,0.65) for 6 = 45°. These 
points are depicted in the right and left panel of Fig. [61 respectively. Fig. [7] 
shows the regions corresponding to 95% confidence level obtained by using 
Eq. (f20D for iV = 100 and N = 1000 sample sizes. 

In order to obtain the c p intervals compatible, at 95% confidence level, with 
the considered points, we have to project the regions of Fig. [7| onto the x 
axis, corresponding to proton abundance. Table [1] shows the inferred compo- 
sition and its uncertainty at 95% confidence level. Note that until now we just 
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consider the composition of a given sample of cosmic rays, not the one cor- 
responding to the universe (sample of infinite number of events), to see how 
to obtain the proton abundance of the cosmic rays from the composition of a 
sample see Appendix [B] 

Table 1 

Inferred proton abundance of the samples and its uncertainty at 95% confidence 
level for the points Pi and Qi with i = 1, 2, 3 for samples of N = 100 and N = 1000 
events. 



Points c P nf for N = 100 c p nS for N = 1000 



Pi 


noi +0.076 

u.uoi_ 031 


0.031 ±0.025 


P-2 


n c:q7+0.096 
u - JO ' -0.090 


0.537 ±0.029 


P 3 


o.945±g:8» 


0.944 ± 0.033 


Qi 


0.177 ±0.095 


0.176 ±0.032 


Q2 


0-468^1 


0.469 ± 0.029 




0.90^° 


n qn fi +0.094 
u - yuo -0.062 



For 9 = 30° (P-points) we see that the uncertainty in the determination of 
the composition varies very slowly with the proton abundance of the samples, 
in particular, the error increases for values of c p closer to one. For 9 = 45° 
(Q-Points) the uncertainty varies faster, taking larger values than for 9 = 30° 
in the c p = 1 region due to the superposition of the ellipses. 

The Auger Observatory measures other composition-sensitive parameters apart 
from X max and iV M (600) by means of the water Cherenkov detectors. Although 
these parameters strongly depend on X max and A^(600), Cherenkov detectors 
(and muon counters) have 100% duty cycle, as opposed to fluorescence tele- 
scopes, which have only 10% duty cycle. This fact highlights the statistical 
value of surface parameters. For this reason we also studied the applicability 
of our method to the combination of 7X^(600) with other parameters obtained 
from the Cherenkov detectors: ti/ 2 , j3 and R (see subsection 13.21) . Fig. [S] shows 
the ellipses corresponding to 68% and 95% probability for the combination 
of £ M with £t 1/2 ,£/3 and £r. We consider samples of 1000 events which is the 
number of events expected in two years of data taking for the 750 m-array of 
AMIGA. 

Fig. [H] shows that, also in these cases, the position of the ellipses allows us to 
test the high energy hadronic models as well as to obtain the proton abundance 
of a sample, when the experimental data falls in the proximity of the ellipses. 
Again, the composition uncertainty increases in the region close to c p = 1 
especially for 9 = 45°. 

A note must be made regarding the potential effects of systematic errors on 
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Fig. 8. Ellipses Corresponding to 68% and 95% of probability for the pair of param- 
eters (^,& 1/2 ), and c p E [0, 1], = 30° and 9 = 45°, N = 1000 and 
for samples generated with the hadronic interaction models QGSJET-II and Sibyll 
2.1. 



our technique. Reconstruction biases are rather well known and, very likely, 
included in a fairly acceptable way in the reconstruction packages used. Fur- 
thermore, all the distributions use here are based on reconstructed simulated 
data. The energy bias, on the other hand, is a fundamental problem for high 
energy cosmic ray shower measurement at present and it is, certainly, a poten- 
tial problem for any technique that attempts to infer cosmic ray composition 
from extensive showers at high energies. Our technique is not an exception 
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in this respect, specially because we use muon number as one of our main 
parameters which is particularly sensitive to energy. Nevertheless, there are 
experimental and theoretical facts that can attenuate the problem, or even give 
it a potentially interesting twist: (i) the bias in energy is bounded to around 
30% or, at most, 50% and (ii) the sign of the bias is known, i.e., Monte Carlo 
simulations overestimate the energy and not the other way around, (iii) the 
difference in muon content between hadronic interaction models in the ankle 
region is about 5%, (iv) X max has a logarithmic dependence on energy. This 
means that, in the ^x max space, curves move in a predictable direction due 
to the bias and by an amount that is bounded but larger than the expected 
one due to uncertainties in the hadronic interaction model. Furthermore, the 
displacement is mostly parallel to the £ M axis, without changing appreciably 
the distance between points of equal composition along curves of constant 
hadronic interaction model. This implies that, even in the presence of an en- 
ergy bias, the position of an experimental point on this plane still carries 
significant physical information. In fact, while one would lose the capacity of 
discriminating between hadronic interaction models for large systematic bi- 
ases in energy, one could gain the capacity of experimentally constraining the 
bias with an uncertainty of the order of 5%. 

In any case, the existence of systematic errors is an important and by no means 
trivial problem, which deserves further studies and the search for more appro- 
priate strategies [H], like the modification or refinement of the parameters 
actually used in the context of the technique proposed here. Therefore, one 
alternative line of action, might be, for example, the redefinition of a muon 
content parameter in the way proposed by Hillas [42J. 



4 Conclusions 

Detailed composition studies at the energies of the second knee and ankle 
of the cosmic ray spectrum will be crucial to weight different astrophysical 
models of the Galactic-extragalactic cosmic ray flux transition. Experiments 
like Auger and Telescope Array in the near future will be instrumental on the 
later. 

In this paper we present a new statistical method to perform composition 
studies in a two dimensional space. The method was designed having in mind 
the enhancements AMIGA and HEAT presently under construction by the 
Pierre Auger Observatory, but its applicability extends to other detectors, 
like TALE, which also have hybrid capability. 

A main advantage of the method is that it minimizes the effects of the present 
uncertainty associated with the hadronic interaction models, used to simulate 
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cosmic ray showers, on the inferred composition. Furthermore, in the case in 
which systematic errors in energy are known or smaller than 5%, the method 
allows, besides the determination of the composition, an independent verifica- 
tion of the compatibility between real shower data and hadronic interaction 
models. In the case of larger systematic errors (e.g., ~ 30%), the technique 
still allows to make reliable composition estimation and, additionally, to set 
an upper limit on the size of the systematic error in energy. 

The novelty of the Auger enhancements is the combination, at energies be- 
tween ~ 10 17 and ~ 10 18 ' 5 eV, of hybrid measurement with additional simul- 
taneous, and independent muon number information. We exploit this wealth 
of data by working on a two-dimensional space £1-^2 which encodes, e.g., X max 
and information. This space has the property of clearly separating com- 
position and hadronic interaction model dependencies. Confidence levels are 
calculated in the form ellipses that enclose regions of 68% and 95% probabil- 
ity. It is also shown the way in which the £ distribution functions decrease as 
the size of the sample increases in a way compatible with the exposure time 
scale appropriate for AMIGA-HEAT. We show that, for the case in which 
the data is not dominated by systematic errors in energy, the constraints im- 
posed on the hadronic models become stronger as the exposure grows while 
the composition error diminishes to an unprecedented accuracy for astrophys- 
ical applications over the lifetime of the experiment. Therefore, besides being 
able to constrain the high energy hadronic models, it should be possible to 
determine the composition as a function of energy in the ankle region with 
errors varying from ~ 20% to ~ 5%, at the 95% confidence level, as the data 
taking progresses from 2 to 20 yr. 
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A Optimum energy bin 

The primary energy is obtained by fitting a lateral distribution function to the 
total signal in each surface station of the water Cherenkov array. This allows 
us to interpolate the shower signal at a fixed distance from the core which, 
in turn, is used as an energy estimator. This reference distance is such that 
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the shower fluctuations go through a minimum in its vicinity, and its exact 
value depends on the geometry of the array; for Auger, the reference distance 
is 1000 m for the 1500 m baseline spacing and 600 m for the AMIGA infill of 
750 m spacing. 

The signal at the reference distance is calibrated with the telescopes via hybrid 
events. The corresponding energy uncertainty for the 1500 m-array of Auger 
is ~ 20% |5j. Guided by this experimental result, we assume in this work a 
25% Gaussian energy uncertainty. 

In order to study the composition at energies of order Eq = 1 EeV, we first 
have to determine the range of reconstructed energies, centered at E r Q, with 
the form IL, = [(1 — 5)E r0 , (l + 5)E r0 \ (5 = 0.25 for 25% of energy uncertainty) 
such that the fraction of events in the interval n = [(1 — S)E , (1 + 5)E ] 
is maximum. The intervals U r and n are different because of the spectrum. 
The contamination of events of real energies smaller than 1 EeV which are 
outside n is greater than the one corresponding to energies, also outside n , 
but above 1 EeV. 

We assume that the number of cosmic ray showers with real energies between 
E and E + dE is well represented by the following power law spectrum, 

dE K ' U ' El~ x -ET V ; 



where, for every set of simulated events we have E\ = 0.6 EeV, Ei = 2 EeV, 
N Q = 6640 (depending on the reconstruction efficiency) and 7 = 2.7 (see 
subsection 13. ip . 

The number of events whose real energy belongs to il such that the recon- 
structed energy falls in n r is given by, 

(l+5)E r0 (l+5)E 

f A (E r0 )= J dE J dE' —(E')G(E,E f ), (A.2) 

(l-S)E r0 (l-6)E 



where G(E,E') = exp[-(E - E') 2 / (25 2 E' 2 )]/(V2n 5 E'). 

On the other hand, the number of events with real energy outside n whose 
reconstructed energy falls in 11,. is given by, 

(l+5)E r0 I" (1-8)E 

f B (E r0 )= J dE J dE' —(E')G(E,E')+ 

(l-6)E r0 E 1 
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Fig. A.l. Ratio of the number of events whose real energy does not belong to Ilo 
but the reconstructed energy falls in II r and the number of events whose real energy 
belongs to Hq but the reconstructed energy falls in II r , F(E t q). The minimum is 
located at E r0 ^1.14 EeV. 



Therefore, the value of E t q for which the fraction of events belonging to Ilo 
that fall in n r is maximum is obtained by minimizing the function F(E r o) = 
f b{E t q) / f a{E t q) . Fig. IA. II shows F(E r0 ) for 5 = 0.25 from which we see that it 
has a minimum but at an energy greater than 1 EeV. The minimum is located 
at E r0 = 114 EeV, then, IL = [0.86, 1.43] EeV. 



B Composition of cosmic rays from the composition of a sample 

Given a sample of N events, assuming a mixture of protons and iron nuclei, 
the number of protons in the sample follows a binomial distribution, 



dE 




dN 



(A.3) 



(1+S)E 




(B.l) 
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where C p is the proton abundance of the cosmic rays. 



If we have a sample of N events corresponding to a binomial distribution with 
no positive trials, an estimator of the parameter p of the binomial formula (C p 
in the Eq. (IB.ll) ) is given by p = n /N and the upper, p max , and lower, p m i n , 
limits of the interval which contains the real value of the parameter p with 
probability a (confidence level), are the solutions of, 



,x-=V' (R2) 

E 1 { N \lin^ ~ P^nf- n = ^ (B.3) 
n=0 W Z 



for no ytz and no ^ N . When no = and no = N the estimators of p are 
zero and one, respectively, and we can obtain, at a given confidence level a, 
an upper limit p up for the case n = and a lower limit pi ow for n = N, 




p up = l- VT^, (B.4) 
p iow = v 7 ! - a. (B.5) 



By using the method described in subsection 13.31 we can find an interval, 
[ci,c 2 ], corresponding to a given confidence level a, for the composition of a 
sample. Therefore, the number of protons of the sample, at a confidence level 
a, is contained in [m, n?\ = [Nc\, Ncq\. For the case where n\ ^ and n2 ^ N 
we can obtain the lower limit for the cosmic rays composition, C™ n , by solving 
Eq. (IB. 31) with n = n\ and the upper limit, C™ ax by solving Eq. (IB. 21) with 



ra = n 2 . For the case in which n x = 0, we just have to calculate the upper 
limit of the interval by solving Eq. (IB. 21) with no = ^2- The same happens 
for the case where n2 = N. We just have to calculate the lower limit of the 
interval by solving Eq. (IB. 31) with no = n±. The criterion adopted to obtain 
the central value for the cosmic rays composition is to take it equal to the one 
of the sample, C p = c p . 

Table IB.ll shows the central values and their uncertainty at 95% confidence 
level for the composition of the cosmic rays inferred from the composition of 
the samples obtained for the points Pi and Qi with i = 1, 2, 3 shown in table 
[TJ As expected, the uncertainty due to the finite size of the samples is more 
important for the case of iV = 100 events. 
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Table B.l 

Cosmic rays composition and its uncertainty at 95% confidence level for the points 
Pi and Qi with i = 1,2,3 obtained from the composition of the samples of N = 100 
and N = 1000 events. 



Points C p for N = 100 C p for N = 1000 



Pi 


o ns +0 - 16 

u - u,3 -0.03 


o n3i+°- 041 

U.UOl_ 029 


P2 


0.54t°;» 


n cqy+0.064 


Ps 


o-95±8:S 


o.94s±8:8S 


Qi 


18+ - 19 


0-1761°;°! 


Q2 


0.47 ±0.20 


0.469 ± 0.060 


Qs 


91+ - 10 


9 06 +0 - 094 
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